Abstract.— The application of molecular sequence data to studies on the phylogeny of the Hymenoptera is reviewed, with 
special attention given to the relationships among the higher levels of the Order. Methods for obtaining sequence information 
from nuclear-encoded ribosomal RNA (rRNA) and mitochondrial rRNA and protein-coding genes are described. Techniques 
for alignment and phylogenetic analysis of sequences are discussed, as are issues associated with the selection of outgroups. 
Recent molecular investigations of hymenopteran phylogeny at several taxonomic levels are discussed to illustrate the 
application of methods and analytical procedures. 



The use of DNA sequence data for systematics 
is recent and controversial. The controversies are 
not about whether nucleotide sequences are ap- 
propriate for reconstructing phylogenetic history 
but rather how they should be used. Therefore, the 
springboard for our review is not a justification of 
the relative merits of sequence data over the 
application of other techniques for phylogenetic 
analysis (for this see Hillis and Moritz 1990); instead, we 
begin with a discussion of the areas of controversy 
that have arisen with the use of DNA sequences for 
phylogenetic analysis. We review each of these 
issues and make recommendations based in part 
on our own experiences with collecting and ana- 
lyzing DNA sequences of Hymenoptera. 

Differences of opinion have arisen over aspects 
of sequence data collection and analysis, including 
(1) the appropriate genes (or gene fragments) to be 
sequenced and their use for different levels of 
inference; (2) methods of data acquisition; (3) 
methods of alignment, character weighting, and 
tree-building; (4) assumptions (or the lack thereof) 
of the models of nucleotide evolution; (5) consider- 
ation of molecular secondary structure and the 
degree to which it can bias interpretation of se- 
quence data for phylogenetic reconstruction; and 
(6) appropriate statistical analyses for estimating 
the reliability of molecular phylogenies. Each of 
these issues confronts all systematists who wish to 
approach phylogenetic reconstruction from a 
molecular perspective, altogether a non-trivial 
pursuit for beginner and experienced alike. 

This paper arose from the symposium 'Phylogeny 
of the Hymenoptera', which was featured during 
the 2nd Quadrennial meeting of the International 
Society of Hymenopterists, held in August 
1991 in Sheffield, England. Three contributions in 
the symposium presented results of phylogenetic 
analyses using DNA sequences. It became clear at 
this meeting that many of our audience were 
unfamiliar with the use of sequence data for systematic 
studies. In the future, systematists will have to 
interpret critically the results from molecular 
studies in order to compare them effectively with 
their own investigations based on morphology or 
other types of data. Therefore, we thought it 
worthwhile to review the subject of molecular 
phylogeny with particular reference to the Hym- 
enoptera. To remain faithful to the theme of the 
symposium, we primarily restrict our discussion 
in this review to questions of higher level phylog- 
eny, that is, to the tribal level or above. However, 
we include a single study of relationships at the 
species level. Given that little has been published 
on comparative DNA sequences for phylogenetic 
reconstruction of the Hymenoptera (but see 
Cameron 1991; Garnery et al. 1991; Sheppard and 
McPheron 1991), we rely heavily on our own in- 
vestigations of sequence comparisons of the small 
(18S) subunit ribosomal RNA gene (rRNA) and the 
large (16S) rRNA gene encoded by the mitochondrial 
genome (mtDNA). For a general review of the 
field of molecular systematics we recommend two 
excellent books by Hillis and Moritz (1990) and 
Miyamoto and Cracraft (1991); for reviews of 
molecular techniques and applications for insect 
systematics see Simon (1991) and Simon et al. (1990, 
1991). Ladiges and Martinelli (1990), though focusing 
on plant systematics, also contains a number of 
useful general papers on both theoretical and 
practical aspects of molecular systematics. 

Fig. 1. Generalized diagram illustrating the components of the nuclear rRNA repeat unit (in this case that of vertebrates, after 
Gerbi 1985), showing the relative positions of the 5.8S, 18S and 28S regions, nontranscribed spacers (NTS), an external 
transcribed spacer (ETS) and two internal transcribed spacers (ITS). 

CLASSES OF DNA FOR 
PHYLOGENETIC ANALYSIS 

In the last decade, the application of molecular 
data to systematics has expanded enormously, and 
comparative DNA sequences have become the 
preferred data for such investigations (Hillis and 
Moritz 1990; Miyamoto and Cracraft 1991). Se- 
quence characters from nuclear and extranuclear 
genomes offer a more or less unlimited supply of 
diverse characters applicable for analyses at all 
taxonomic levels, from the population to the 
Kingdom. Different genes and gene regions exhibit 
vastly different evolutionary rates, structural or 
functional constraints, and mutational biases (Nei 
1987; Larson and Wilson 1989; Simon et al. 1991); 
thus it is potentially possible to match specific 
systematic questions to appropriate genomic re- 
gions for analysis. For example, regions of DNA 
that are evolutionarily conserved, such as sections 
of rRNA (Gerbi 1985), are useful for resolving early 
phylogenetic history (Field et al. 1988; Lake 1988; 
Mindell and Honeycutt 1990), whereas regions 
showing intermediate (Larson and Wilson 1989) or 
rapid divergence (Brown et al. 1979; Crozier et al. 
1989) are useful for evaluating evolutionary events 
that occur on intermediate (Larson 1991; Cameron 
1991) or short (Greenberg et al. 1983) time scales. 
Nuclear rRNA sequences have been used exten- 
sively for the phylogenetic reconstruction of a great 
diversity of organisms, and more recently, mito- 
chondrial rRNA and protein-coding genes have 
contributed even more sequence information 
(Simon et al. 1991). We briefly review each of these 
classes of DNA. 

Nuclear encoded rRNA . — Ribosomes are the sites 
for cellular protein synthesis and as such their 
RNA is present in many copies and is abundant 
compared with cellular mRNA and tRNA. Eukaryote 
rRNA is composed of two subunits; the 
smaller subunit has a sedimentation coefficient of about 
18 and is known as 18S rRNA, while the larger 
subunit comprises three components, viz. 5S, 5.8S 
and 28S rRNA (Fig. 1). Sequences from the smaller 
components (5S and 5.8S) are generally inappro- 
priate for phylogenetic analysis because of their 
size, but the intermediate and larger subunits, par- 
ticularly 18S rRNA, have been used to examine 
relationships among a great range of taxa (Johnson 
and Baverstock 1989; Mindell and Honeycutt 1990; 
Baverstock and Johnson 1990; Larson 1991). The 
18S rRNA is 1700 to 2300 bases long in eukaryotes 
and as a non-coding region, its insertions and 
deletions can comprise any number of bases (not 
limited to multiples of three) because frame shifts 
do not apply. Furthermore, comparison of se- 
quences indicates that introns are generally absent 
in rRNA (Baverstock and Johnson 1990). 

Some regions of 18S rRNA are moderately 
variable and have application for lower levels of 
phylogenetic analysis. However, the more con- 
served regions have been the focus of many higher- 
level studies. Indeed, so called 'fossil RNA' exhibits 
identical sequences (24 bases in length) between 
organisms as divergent as prokaryotes (e.g., 
archaebacteria) and eukaryotes (e.g., humans). 
Generally, 18S rRNA sequences are considered 
useful for taxa that diverged from 100-1000 Mya 
(Baverstock and Johnson 1990), while 28S rRNA 
sequences are useful for divergence times of 60-200 
Mya (Larson 1991). Studies to date (Table 1) have 
examined the relationships between Kingdoms, 
major prokaryote groupings, protistan phyla, in- 
vertebrate phyla, classes of platyhelminths, chor- 
date groups, and vertebrates. Few studies have 
been published on the phylogeny of insect groups 
using nuclear encoded rRNA sequences; exceptions 
include Vossbrinck and Friedman (1989) on 
cyclorrhaphous Diptera, Wheeler (1989) on the 
Insecta, and Sheppard and McPheron (1991) on Apidae. 
Also, a recent analysis of the blattoid insects has 
been completed by Vawter (1991 Ph.D. dissertation). 

Table 1. Selected references to studies employing rRNA 
sequence data for phylogenetic analysis and the 
corresponding taxa examined. 

Reference                           Region                Taxon

Nuclear 5S, 5.8S, 18S and 28S rRNA
Walker 1985                         5S & 5.8S             Protistan phyla
Woese et al. 1990                   18S                   Kingdoms
Fox et al. 1980                     18S                   Major prokaryote groups
Woese 1987                          18S                   Major prokaryote groups
Johnson & Baverstock 1989 (review)  18S                   protistan phyla
Field et al. 1988                   18S                   Invertebrate phyla
Baverstock et al. 1991a             18S                   Platyhelminths
Qu et al. 1986                      18S                   Helminths
Wheeler 1989                        18S                   Insect orders
Joss et al. 1991                    18S                   Chordate groups
Baverstock et al. 1991b             18S                   Higher vertebrates
Jupe et al. 1988                    18S                   Algae
Mindell & Honeycutt 1989            18S/28S               Birds
Hedges et al. 1990                  18S/28S               Tetrapods
Hamby & Zimmer 1988                 18S/26S               Grasses
Zimmer et al. 1989                  18S/26S               Flowering plants
Baroin et al. 1988                  28S                   Unicellular eukaryotes
Vossbrinck & Friedman 1989          28S                   Diptera
Larson 1991                         28S                   Salamander
Hillis & Dixon 1989                 28S                   Vertebrates
Hillis & Davis 1987                 28S                   Amphibians
Schmicket et al. 1990               28S                   Primates
Sheppard & McPheron 1991            18S/28S               Apidae

Nuclear intergenic spacer (IGS) of rRNA
Collins et al. 1987                                       Anopheles (Diptera)
Beach et al. 1989                                         Anopheles (Diptera)
Collins et al. 1989                                       Anopheles (Diptera)
Tautz et al. 1987                                         Drosophila (Diptera)
Lassner et al. 1987                                       Triticum (wheat)
Sheppard & McPheron 1991            ITS1                  Apidae

Mitochondrial 12S and 16S rRNA
Cameron 1991                        16S                   Apidae
Derr et al., in press               16S                   Hymenoptera
Thomas et al. 1989                  12S                   marsupials
Hixson & Brown 1986                 12S                   primates
Miyamoto & Boyle 1989               12S/16S               eutherian mammals
Miyamoto et al. 1989                12S/16S               artiodactyl mammals
Simon et al. 1990                   12S                   cicadas
Sheppard & McPheron 1991            12S (very divergent)  Apidae
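
The divergence-time windows quoted in the text for nuclear rRNA (18S: 100-1000 Mya; 28S: 60-200 Mya) can be expressed as a simple lookup. The following is only an illustrative sketch: the function name and the dictionary are hypothetical, and the windows are rough guides taken from the citations above, not hard limits.

```python
# Approximate divergence-time windows (million years ago) quoted in the text:
#   18S rRNA: 100-1000 Mya (Baverstock and Johnson 1990)
#   28S rRNA: 60-200 Mya (Larson 1991)
USEFUL_RANGE_MYA = {
    "18S rRNA": (100, 1000),
    "28S rRNA": (60, 200),
}


def candidate_markers(divergence_mya):
    """Return the markers whose quoted window contains the divergence time."""
    return sorted(
        marker
        for marker, (low, high) in USEFUL_RANGE_MYA.items()
        if low <= divergence_mya <= high
    )


print(candidate_markers(150))  # 150 Mya falls inside both windows
print(candidate_markers(500))  # 500 Mya falls inside the 18S window only
```

In practice such a lookup would only be a first filter; the variable and conserved regions within each gene, discussed above, matter as much as the gene itself.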

Mitochondrial DNA. — Animal mtDNA is a cir- 
cular, double-stranded molecule ranging from 
about 14 kb to 39 kb in length (reviewed in Avise et 
al. 1987; Moritz et al. 1987). The mtDNA of only one 
insect, Drosophila yakuba, has been completely 
sequenced (Clary and Wolstenholme 1985). It encodes 
13 proteins, 22 tRNAs and two rRNAs, as for most 
animals. Partial mtDNA sequences are known for 
other insects, including crickets (Rand and Harrison 
1989), mosquitoes (HsuChen and Dubin 1984; 
HsuChen et al. 1984), cicadas (Simon et al. 1990), 
and honey bees (Vlasak et al. 1987; Crozier et al. 
1989; Garnery et al. 1991; Cameron, unpublished 
data). For a complete list of published mtDNA 
sequences see Simon et al. (1991). 

In vertebrates, mtDNA has been found to evolve 
many times faster than single-copy nuclear DNA 
(scnDNA) (Moritz et al. 1987), in contrast to inver- 
tebrates, which exhibit approximately equal rates 
of change (amino acid or nucleotide substitutions) 
for both genomes (Vawter and Brown 1986; Powell 
et al. 1986). In the Hymenoptera, some mtDNA 
genes are highly conserved (e.g. ND3, Les Willis, 
unpublished data for Apis), while others exhibit 
rapid rates of divergence (e.g., COII, Crozier et al. 
1989; Garnery et al. 1991; rRNA, Cameron, unpub- 
lished data; Derr et al., in press). Within both the 
12S and 16S rRNA genes, some regions are highly 
conserved while other regions are rapidly diverg- 
ing (Cameron, unpublished data; Derr et al., in 
press). Recent investigations of honey bee mtDNA 
indicate that it has a significantly greater evolu- 
tionary rate than that of Drosophila (Crozier 1989); 
however, the causal factors are unknown. 

In summary, current knowledge suggests that 
hymenopteran mtDNA exhibits vastly different 
rates of evolution, and therefore is useful for 
phylogenetic inference at many levels. A note of caution 
is warranted, however, when examining relationships 
below the genus level. The random sorting of 
polymorphic genes within a species may lead to a lack 
of congruence between the phylogenetic pattern of 
the gene (mtDNA) and that of a group of closely 
related species (Takahata 1989). We discuss below 
(Examples of Current Research) the usefulness of 
comparative sequences from the 16S rRNA gene 
for assessing phylogenetic relationships at three 
different levels: (1) among species of the genus 
Apis, (2) among tribes of the family Apidae, and (3) 
among families and superfamilies of Hymenoptera. 

Protein-coding genes. — Protein-coding genes 
(also referred to as structural genes) are transcribed 
into RNA and then translated into proteins. These 
have been used less often for phylogenetic analyses, 
in part because early on, rRNA genes (mitochondrial 
and nuclear) proved useful for many 
levels of phylogenetic inference (see above). Thus, 
a large number of rRNA primers have been syn- 
thesized, many of which are applied to new studies 
utilizing sequence data. Fewer primers are available 
for protein-coding genes. An additional concern is 
that many nuclear encoded protein-coding genes 
are parts of multiple-copy divergent gene families, 
making analysis (and possibly PCR) more compli- 
cated. However, the application of mitochondrial 
protein-coding sequences for phylogenetic studies 
of Hymenoptera is expanding. In addition to the 
study of Apis relationships by Garnery et al. (1991), 
full sequences are available for the mitochondrial 
COI and COII genes of A. mellifera L. (Crozier 1989). 
These have been used to synthesize primers for 
several current phylogenetic investigations, in- 
cluding another analysis of the genus Apis (Les 
Willis, unpublished data). For a review of current 
knowledge on mitochondrial protein-coding genes, 
including primer sequences used for PCR and 
sequencing, see Simon et al. (1991). 

OBTAINING SEQUENCE DATA FOR 
PHYLOGENETIC ANALYSIS 

Although the usefulness of nucleotide sequences 
for phylogenetic analysis has become widely rec- 
ognized, the need for technical training in molecu- 
lar biology, and the time and expense involved in 
obtaining the data have curtailed widespread use of 
the technology for systematics. This has been par- 
ticularly true for Hymenoptera and other insect 
groups, which are small relative to vertebrates, 
presenting challenges for extracting DNA in suf- 
ficient quantities for sequencing. In addition, many 
Hymenoptera, especially aculeates, have a hard 
chitinous exoskeleton which, in contrast to the soft- 
bodied Drosophila, makes DNA extraction more 
difficult. 

These problems have, in principle, been solved 
by the revolutionary new development of auto- 
mated technology for the enzymatic amplification 
of DNA based on the polymerase chain reaction 
(PCR) (Saiki et al. 1988; Innis et al. 1990). PCR is a 
thermocyclic reaction (discussed below) that gen- 
erates multiple copies of a fragment of DNA rela- 
tively quickly and cheaply, eliminating the lengthy 
procedures of viral or bacterial cloning (Saiki et al. 
1985; Mullis et al. 1986; Mullis and Faloona 1987; 
Cherfas 1990; Innis et al. 1990; Mullis 1990). 
Although PCR is still in its infancy as a tool for 
systematics, this method now makes it feasible to 
obtain large quantities of homologous DNA for 
direct sequencing from individual insects (Wheeler 
1989; Simon et al. 1991; Cameron 1991). Because 
small amounts of template DNA are sufficient for 
amplification with PCR, samples no longer must 
be fresh or frozen; they may be preserved in alcohol 
or formalin, or even dried (Paabo 1989; Paabo 1990; 
Kocher et al. 1989). Thus, PCR has the capacity to 
expand phylogenetic investigations to include 
untapped temporal and geographic coverage of 
museum specimens. 

PCR works with two oligonucleotide primers, 
which are short pieces of DNA in the range of 18- 
25 base pairs (bp) in length (see Appendix 1). Each 
primer is designed to be complementary to one of 
the two strands of the sample DNA, and together 
they flank the region to be amplified, which is 
usually several hundred to several thousand base 
pairs (kilobases or kb) in length. PCR occurs in 
three steps, repeated 30-40 times. First, the sample 
DNA is denatured by heat into its two respective 
strands. Next, the reaction mixture is cooled to 
allow the two primers to anneal to their comple- 
mentary strands. Lastly, in the presence of a ther- 
mostable DNA polymerase, such as Taq polymerase 
(derived from a thermophilic bacterium), the two 
complementary sample strands are replicated by 
primer extension, beginning at the primer sites (for 
figured descriptions see Hillis et al. 1990; Simon et 
al. 1991). The target DNA is therefore replicated 
exponentially, and within several hours the 
double-stranded sample has been amplified several 
millionfold. Single-stranded DNA can be produced 
by using an excess of one of the primers, a proce- 
dure known as asymmetric amplification 
(Gyllensten and Erlich 1988). 
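
The exponential character of the amplification described above can be illustrated with a short calculation. This is a minimal sketch under idealized assumptions: the function name and the per-cycle `efficiency` parameter are hypothetical and are not taken from any of the cited protocols. With perfect doubling, the fold increase after n cycles is 2^n, so 20 cycles already give about a millionfold (2^20 = 1,048,576); real reactions are generally expected to fall short of perfect doubling.

```python
def amplification_factor(cycles, efficiency=1.0):
    """Fold increase in target copies after a number of PCR cycles.

    `efficiency` is an assumed fraction of template strands copied in each
    cycle: 1.0 means perfect doubling, 0.0 means no amplification at all.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return (1.0 + efficiency) ** cycles


# Compare perfect doubling with an arbitrary, lower per-cycle efficiency.
for n in (10, 20, 30):
    ideal = amplification_factor(n)
    reduced = amplification_factor(n, efficiency=0.7)
    print(f"{n} cycles: ideal {ideal:.3g}-fold, 70% efficiency {reduced:.3g}-fold")
```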

The procedures and protocols for DNA extrac- 
tion, amplification, purification and sequencing 
(modified for Hymenoptera) are too extensive to 
present here and will be published elsewhere 
(Cameron, unpublished data; Derr et al., in press). 
However, several recent references provide useful 
information: Hillis et al. (1990) describe a basic 
laboratory setup, protocols, and recipes for stock 
solutions; Innis et al. (1990) provide a thorough 
description of PCR methodology and its various 
applications and protocols; Simon et al. (1991) 
provide up-to-date information on invertebrate 
mitochondrial (and other) primer sequences for use 
with PCR, as well as PCR protocols for use with 
insect taxa; and Maniatis et al. (1982) is an 
indispensable reference for many molecular procedures. 

Fig. 2. Diagrammatic representation of the sequencing of small and large subunit rRNA using the reverse transcriptase method. 
In this case for the 18S rRNA, primer A is added to a bulk RNA extract; it hybridizes to the complementary region and acts to 
prime the initiation of DNA synthesis in a Sanger sequencing reaction (see text for further details) (after Johnson and 
Baverstock 1989). 

Another method of obtaining DNA for se- 
quencing is to isolate the transcribed DNA for a 
region of RNA (e.g., mRNA) and use reverse tran- 
scriptase for sequencing. This method, developed 
by Qu et al. (1983) and Lane et al. (1985) for 
obtaining sequences of RNA, provides a relatively quick 
and easy method of sequencing by using small 
conserved regions to prime reverse transcription 
of cellular RNA. In short, sample tissues are treated 
with guanidine hydrochloride to block RNAase 
activity. Bulk RNA, consisting primarily of rRNA, 
is then purified from DNA and protein (see Larson 
and Wilson 1989; Hillis et al. 1990). One of several 
oligonucleotide primers (e.g., Field et al. 1988; 
Baverstock et al. 1991a) is then added to the puri- 
fied RNA and the DNA is then sequenced by chain 
termination (Sanger et al. 1977) at the same time 
that it is produced by reverse transcription. The 
resultant products are then run on a sequencing gel 
and the sequences read from an autoradiograph 
(Fig. 2). The entire procedure takes only days 
instead of the months required for cloning DNA. 

ANALYSIS OF SEQUENCE DATA 

Sequence Comparison and Alignment . — Probably 
the most difficult and least understood aspect of 
the use of sequence data for phylogenetic analysis 
is sequence alignment (Swofford and Olsen 1990). 
Phylogenetic analysis of sequence information re- 
quires the correct alignment of homologous com- 
ponents between pairs of sequences. One must be 
careful to distinguish between phylogenetically 
homologous DNA sequences (orthologs) and 
multiple, diversified gene copies within single in- 
dividuals (paralogs) (Fitch 1970, Patterson 1987). 
In phylogenetic studies, the goal is to compare 
orthologous sequences from taxa of interest. Su- 
perficially, this would seem a rather straightfor- 
ward task. For example, each nucleotide position 
can be viewed as a 'character' with only a limited 
number of 'states' possible at each position (i.e., for 
DNA sequences, 'A', 'C', 'G', 'T', or a gap mutation). 

Suborder    Superfamily       Genus          Nucleotide Sequence

Symphyta    Siricoidea        Tremex         AA-ATATAAATTAAATTCT-
Apocrita    Vespoidea         Polistes       AA-AACATTTTTAAATTCT-
Apocrita    Ichneumonoidea    Xanthopimpla   ATTAATA-AATTAAA-GCTC

Fig. 3. One possible sequence alignment from a small segment of the large ribosomal subunit (16S rRNA) of three representative 
hymenopteran taxa. This region corresponds to positions 13,205 to 13,225 of the published Drosophila sequence (Clary and 
Wolstenholme 1985) (data are from Derr et al., in press). From a total of 20 nucleotide positions from three taxa there are six 
inferred gap mutations, only six G or C bases, and 48 A or T bases. There are 10 possible base substitutions of which nine are 
transversions (involving A/T, A/C or G/T bases) and only one transition (T/C). 
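
The gap and base-composition counts given in the caption of Fig. 3 can be checked directly from the three aligned sequences. The sketch below simply pools the sequences and counts symbols; the function name is hypothetical, and the sequences are transcribed from the figure.

```python
from collections import Counter

# Aligned 16S rRNA fragments as printed in Fig. 3 ('-' marks an inferred gap).
ALIGNMENT = {
    "Tremex":       "AA-ATATAAATTAAATTCT-",
    "Polistes":     "AA-AACATTTTTAAATTCT-",
    "Xanthopimpla": "ATTAATA-AATTAAA-GCTC",
}


def composition(alignment):
    """Pool all sequences and count gaps, G+C bases and A+T bases."""
    counts = Counter("".join(alignment.values()))
    return {
        "gaps": counts["-"],
        "GC": counts["G"] + counts["C"],
        "AT": counts["A"] + counts["T"],
    }


print(composition(ALIGNMENT))
```

The three totals agree with the six gaps, six G or C bases, and 48 A or T bases stated in the caption.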



In practice, however, alignment of multiple 
nucleotide sequences can involve a number of 
complicating factors. For example, as discussed by 
Swofford and Olsen (1990), in addition to requiring 
the use of orthologous sequences, phylogenetic 
analysis of sequence data requires that one make 
the assumption that all nucleotides observed at a 
given position are traceable to a common ancestor. 
Historical events such as insertions, deletions, 
duplications, rearrangements, and multiple 
nucleotide substitutions all combine either to cloud 
the evolutionary history of some nucleotide posi- 
tions or make non-homologous positions indistin- 
guishable. 

Determination of sequence homology and 
alignment usually presents few ambiguities when 
working with protein-coding gene sequences, par- 
ticularly scnDNA. Landmark features along these 
sequences such as codons (three adjacent bases 
specifying an amino acid), intron/exon junction 
consensus sequences, various start and stop signals, 
and other DNA/protein conserved binding sites 
provide clues that make alignment of these se- 
quences straightforward. These landmark features 
are especially useful for alignment when very 
distantly related taxa are compared. Moreover, 
positions within each codon tend to evolve at dif- 
ferent rates, with third position changes being 
most frequent, first position changes being highly 
conserved, and second position changes some- 
where in between. Therefore, the reading frames in 
protein coding sequences provide an inherent 
structure useful in alignment. 

Nonprotein-coding regions, such as rRNA, 
tRNA and other non-translated sequences are po- 
tentially more difficult to align with distantly re- 
lated taxa, due in part to the lack of these landmark 
features. In addition, these sequences usually are 
characterized by nucleotide base insertion and 
deletion events, presumably because there are no 
selective constraints to maintain reading frames 
that code for specific amino acids (Mindell 1991). In 
practice, however, the alignment of most nuclear- 
encoded rRNA sequences by eye does not seem to 
have posed significant problems because of the 
relatively small number of insertions/deletions 
and the general conservative nature of the sub- 
units. Some regions of mitochondrial 16S rDNA 
may pose alignment problems in Hymenoptera 
because they exhibit an unusually high frequency 
of A's and T's relative to G's and C's (Cameron, 
unpublished data; Derr et al., in press). 
Consequently, nucleotide substitutions in these areas 
may be characterized by a high proportion of 
transversions (purine (A/G) to pyrimidine (T/C) 
substitutions, or the reverse) as opposed to the bias 
toward transitions (purine - purine or pyrimidine 
- pyrimidine) commonly observed in vertebrate 
mitochondrial genomes (Hixson and Brown 1986; 
Thomas and Beckenbach 1989). This becomes im- 
portant when using computer alignment schemes 
(discussed below), which generally assign higher 
penalties to transversion substitutions. Also, con- 
siderable length polymorphism is evident in these 
AT-rich regions; large insertions and deletions 
(often greater than 10 base pairs in length) can 
further complicate alignment because of uncertain 
homology among the bases. It is best to exclude 
these hypervariable regions from the analysis. Se- 
quences from the 16S rRNA region are depicted for 
three hymenopteran taxa in Fig. 3, taken from the 
study of Derr et al. (in press). These provide ex- 
amples of insertion/deletion events, strong A/T 
base compositional bias, and a correspondingly 
high rate of transversion over transition substitutions. 
As a consequence of these factors, most of the 
difficulties encountered in aligning nucleotide 
sequences involve nonprotein-coding regions. 
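
The distinction between transitions and transversions drawn above can be made concrete with a small classifier. This is a sketch only: the function names are hypothetical, and the example compares just two of the three sequences of Fig. 3 column by column, so its tally is a pairwise count and is not the same quantity as the substitution count given in the figure caption.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(base_a, base_b):
    """Classify a pair of aligned symbols from two sequences."""
    a, b = base_a.upper(), base_b.upper()
    if a not in BASES or b not in BASES:
        return "other"          # gap or ambiguity code in at least one sequence
    if a == b:
        return "identical"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq_a, seq_b):
    """Tally substitution classes over the columns of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return Counter(classify_substitution(a, b) for a, b in zip(seq_a, seq_b))


# Tremex versus Polistes, as aligned in Fig. 3.
tally = count_substitutions("AA-ATATAAATTAAATTCT-", "AA-AACATTTTTAAATTCT-")
print(tally["transition"], tally["transversion"])
```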

Algorithms designed to determine the optimal 
alignment between two sequences have been 
available for some time (Needleman and Wunsch 
1970; Sankoff 1972; Sellers 1974; Waterman et al. 
1976). These programs attempt to maximize the 
number of matches or to minimize the number of 
substitutions, insertions, or deletions required to 
make two sequences equivalent (Mindell 1991). 
However, extending this approach to more than 
two sequences with no a priori regard to their 
phylogenetic relationships has been deemed 
inappropriate for reconstructing phylogenies (Hein 1989, 
1990b; Feng and Doolittle 1987, 1990). These authors 
contend that in multiple alignments the initial choice 
of sequences for pairwise comparison can bias the 
final alignment, result in an excess of inferred gap 
events, and even affect phylogenetic results (Lake 
1991). Therefore, multiple sequence alignment, at 
least in principle, must fall under the same 
constraints used to infer phylogeny, in this case, global 
parsimony or minimizing the overall number of 
substitution and gap events. Alignment of 
sequences could be considered as part of phylogeny 
inference, rather than as an independent analysis 
(Sankoff et al. 1973). Moreover, phylogenetic con- 
gruence with other independent data sets offers a 
means of choosing among equally parsimonious 
alignments (Hillis et al. 1990). 
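
The pairwise optimization described above can be sketched with a dynamic-programming recurrence in the spirit of Needleman and Wunsch (1970). The version below is a simplified illustration and not a reproduction of any published program: it returns only the optimal global score, and the match, mismatch, and linear gap values are arbitrary assumptions chosen for the example.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score for two sequences (linear gap cost)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two bases
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

Because the result depends entirely on the chosen scores, different penalty schemes can yield different optimal alignments for the same pair of sequences, which is one reason the choice of penalties deserves explicit justification.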

Alignment of nucleotide sequences may be ac- 
complished by hand and computer. Several 'pro- 
gressive alignment' computer programs are cur- 
rently available (e.g., Higgins and Sharp 1988; 
Higgins et al. 1992; Hein 1989). These programs 
generally proceed by: (1) calculating an initial 
similarity value for each pairwise comparison of 
sequences; (2) constructing a dendrogram by cluster 
analysis using the matrix of these values; and (3) 
aligning the sequences according to the branching 
order in the dendrogram. Alignment scores are 
calculated by assigning positive or negative values 
to matches and mismatches and by imposing 
penalties for both the insertion of gaps and for each 
additional change within a gap. In most cases the 
user may assign a numerical value for each of these 
penalties. Aligned sequences may then be analyzed 
phylogenetically using any of the currently avail- 
able parsimony-based computer packages. Mea- 
sures of homoplasy in the results can also be used 
to discriminate among various sequence alignments. 
In practice, results of computer alignment 
procedures should always be compared with those 
obtained by hand, and we have found (Cameron, 
unpublished data; Derr et al., in press) that final 
computer alignments can be fine-tuned by visual 
inspection. 
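
The scoring scheme described above, with values for matches and mismatches plus separate penalties for opening a gap and for each additional position within it, can be sketched as a function that scores an existing pairwise alignment. The function name and the numerical defaults are hypothetical; actual programs use their own parameter names and default values.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1,
                    gap_open=-3, gap_extend=-1):
    """Score two aligned rows ('-' marks a gap) with affine gap penalties."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    in_gap_a = in_gap_b = False  # is a gap currently running in either row?
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # column is empty in both rows; contributes nothing
        if a == "-":
            total += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif b == "-":
            total += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            total += match if a == b else mismatch
            in_gap_a = in_gap_b = False
    return total


# One two-base gap versus two separate one-base gaps in otherwise equal rows.
print(alignment_score("A--CG", "ATTCG"), alignment_score("A-C-G", "ATCTG"))
```

With these assumed values a single long gap is penalized less than two short ones of the same total length, which is the behavior such schemes are intended to produce.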

At present our understanding of the complexities 
of sequence comparison and analysis is still in- 
complete but developing rapidly. Our intent here 
has been to highlight the problems inherent in 
multiple sequence alignment as it relates to phylo- 
genetic reconstruction, and to indicate some 
methods available for their solution. In general, 
sequence alignment is straightforward when deal- 
ing with single-copy protein-coding sequences; 
with non-protein coding sequences the researcher 
should be aware that in areas with few conserved 
landmark features, sequence alignment can present 
a number of experimental challenges. Fortunately, 
as the field of molecular systematics continues to 
evolve and as more comparative sequence data 
become available, these challenges will be met by 
the development of increasingly useful and realis- 
tic computer alignment algorithms. For information 
regarding the algorithms discussed here, refer to 
the work of Sellers (1974), Smith et al. (1981, 1985), and 
Feng and Doolittle (1987, 1990). For further 
information on sequence alignment, homology, and 
weighting schemes see Mindell (1991); for general 
reviews see Bell and Marr (1989), Doolittle (1990), 
Hillis et al. (1990), Hein (1989), and Watermann et 
al. (1991). 

Phylogenetic Analysis of Sequence Data . — Many 
methods have been proposed for reconstructing 
phylogenetic relationships with DNA sequence 
data for three or more taxa, and we do not propose 
to review them all here. Swofford and Olsen (1990) 
and Felsenstein (1988) provide excellent recent 
reviews of distance, maximum likelihood, and 
parsimony methods, and comment on both the 
logical foundations of various approaches and the 
'nuts and bolts' issues of actually getting the job 
done. Of the various approaches currently in use, 
we favor a simple parsimony model for reasons of 
simplicity and clarity, both in analysis and in the 
interpretation of results. With correctly aligned 
sequences, parsimony analysis is relatively 
straightforward. Each nucleotide position is treated 
as an independent, unweighted character with four 
possible states: adenine or guanine (purines) or 
cytosine or thymine (pyrimidines). The simplest 
approach is to treat a substitution from one base to 
any other as equally likely ('Fitch parsimony', Fitch 
1971) and this can be accommodated by treating 
the characters as 'unordered' or 'non-additive' 
(terminology differs between programs). However, 
because of structural constraints on the DNA mol- 
ecule itself, a bias toward transitions (purine-purine 
or pyrimidine-pyrimidine) and against 
transversions (purine-pyrimidine or the reverse) 
has often been noted (e.g. Li et al. 1984, Hixson and 
Brown 1986). Some programs offer tools to accom- 
modate differential weighting of some character 
state changes over others. For example, the 'Step 
Matrix' function in PAUP (Swofford, 1990) can be 
used to assign any integer weight for changes 
between any two character states. Of course, the 
problem is to determine what weights to assign. In 
highly AT-rich sequences such as are found in 
many Hymenoptera, many changes necessarily 
will be from A to T or the reverse (transversions), 
and there may not be a bias towards transitions. It 
is possible, at least in theory, to examine empiri- 
cally the base composition of sequences and to 
derive from these the expected probabilities for the 
different categories of substitution. Swofford and 
Olsen (1990) make a sensible suggestion: by giving 
only slightly lower weight to transversions, the 
weights will come into play only in choosing be- 
tween essentially equally parsimonious solutions, 
and transitions will then be given the edge. 
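
The effect of a step matrix on tree length can be illustrated for a single site on a fixed tree. The following is a minimal sketch using generalized-parsimony dynamic programming (the approach usually attributed to Sankoff), not the PAUP implementation; the four-taxon tree, the tip states, and the costs (transition = 1, transversion = 2) are illustrative assumptions only.

```python
PURINES = {"A", "G"}
BASES = "ACGT"

def step_cost(a, b, ts=1, tv=2):
    """Cost of changing base a into base b under a simple step matrix."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return ts if same_class else tv

def site_costs(node, ts=1, tv=2):
    """Return {base: minimum subtree cost if this node carries that base}.

    A node is either a tip (a one-letter string) or a tuple of child nodes.
    """
    if isinstance(node, str):
        return {b: (0 if b == node else float("inf")) for b in BASES}
    child_tables = [site_costs(child, ts, tv) for child in node]
    return {
        b: sum(min(step_cost(b, c, ts, tv) + table[c] for c in BASES)
               for table in child_tables)
        for b in BASES
    }

def site_length(tree, ts=1, tv=2):
    """Minimum weighted length of one site on the given tree."""
    return min(site_costs(tree, ts, tv).values())

# Hypothetical site: tips A and G are sisters, as are tips A and T.
tree = (("A", "G"), ("A", "T"))
unweighted = site_length(tree, ts=1, tv=1)  # all changes equal (Fitch-style)
weighted = site_length(tree, ts=1, tv=2)    # transversions cost twice as much
```

With equal costs this site requires two steps; with the assumed 1:2 matrix the same site contributes three, because one of the two changes must be a transversion.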

A strategy to reduce the effects of homoplasy 
with sequence data from protein-coding genes is to 
eliminate nucleotides in the third position of each 
codon, or to give them a lower weight. This is based 
on the redundancy of the DNA code; that is, in 
most cases the first two positions of the code are 
sufficient to specify an amino acid and the third 
may be redundant information. As a result, sub- 
stitutions in third positions may accumulate more 
rapidly than in the other positions. If more than one 
substitution has taken place, the position is no 
longer informative. Although this approach is 
usually limited to protein-coding genes, in an 
analogous fashion, if the secondary structure of 
non-protein-coding sequences is known, regions 
shown to be undergoing compensating substitu- 
tions can be eliminated (Wheeler and Honeycutt 
1988), or preferably, given an appropriate lower 
weight (Vawter, 1991). 
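
As a minimal sketch of how third positions might be removed or down-weighted before analysis, the function below assigns a weight to each alignment column according to its codon position. It assumes the alignment is in frame and contains no frame-shifting gaps; the function names and the example fragment are hypothetical.

```python
def codon_position_weights(n_sites, start_position=1, weights=(1.0, 1.0, 0.0)):
    """Return one weight per alignment column based on its codon position.

    start_position is the codon position (1, 2 or 3) of the first column;
    weights gives the weight for first, second and third positions.
    """
    if start_position not in (1, 2, 3):
        raise ValueError("start_position must be 1, 2 or 3")
    return [weights[(start_position - 1 + i) % 3] for i in range(n_sites)]

def drop_zero_weight_sites(sequence, site_weights):
    """Keep only the columns whose weight is greater than zero."""
    return "".join(b for b, w in zip(sequence, site_weights) if w > 0)

seq = "ATGGCTAAA"                        # hypothetical fragment: ATG GCT AAA
w = codon_position_weights(len(seq))     # third positions get weight 0
kept = drop_zero_weight_sites(seq, w)    # "ATGCAA"
```

Passing, for example, `weights=(2, 2, 1)` instead would retain third positions at half the weight of the others rather than eliminating them.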

For analysis of small data sets, any up to date 
computer algorithm for parsimony analysis will 
suffice, but we recommend using a recent version of
one of the readily available algorithms such as 
PAUP (Swofford 1990) or Hennig86 (Farris 1988). 



For studies with fewer than 15-20 terminal taxa, 
one of the exact methods can be used (branch and 
bound, exhaustive search, or implicit enumera- 
tion), and one can be confident that the most 
parsimonious tree or trees have been found. For
larger datasets or those with relatively high levels 
of homoplasy, heuristic search procedures such as 
branch-swapping will be required. In such cases, it 
is important to try many different addition se- 
quences and search procedures, until one's patience 
has literally been exhausted, because it is often 
difficult to escape local optima in which the algo- 
rithms become trapped, or to find all of the differ- 
ent groups (or 'islands') of equally parsimonious 
solutions (Maddison 1991). 
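
The reason exact methods become impractical beyond a modest number of taxa is the growth in the number of possible trees. A short sketch, assuming fully bifurcating unrooted trees, for which the count is (2n - 5)!! for n taxa:

```python
def n_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees for n_taxa tips."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Product of the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

# 3 taxa -> 1 tree, 4 -> 3, 5 -> 15, 10 -> 2,027,025
table = {n: n_unrooted_trees(n) for n in (3, 4, 5, 10, 15, 20)}
```

At 20 taxa the count already exceeds 10^20, which is why heuristic searches with many different addition sequences are needed for larger data sets.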

A problem that is more or less unique to sequence 
data is how to handle insertion and deletion events, 
for example, as inferred by alignment procedures. 
A conservative approach is to treat gaps in se- 
quences as missing data. In this case they will have 
no effect on tree length or character state optimiza- 
tion. However, insertions and deletions may rep- 
resent real phylogenetic events and this approach 
ignores their potential contribution to phyloge- 
netic reconstruction. An alternative is to treat in- 
sertions and deletions as separate characters, but if 
they vary in length, one will encounter problems in 
establishing their homology and the transformation 
series among them. One's choice of approach should
be governed directly by the data. 
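
The difference between the two treatments of gaps can be seen by counting steps for a single site under each. The sketch below applies a simple Fitch count to a fixed rooted binary tree; the tree and the site pattern are hypothetical, and the `gap_mode` parameter is our own illustrative device rather than an option of any particular program.

```python
ALL_BASES = frozenset("ACGT")

def tip_set(state, gap_mode):
    """State set for a tip under the chosen gap treatment."""
    if state == "-" and gap_mode == "missing":
        return ALL_BASES        # compatible with any base, so adds no steps
    return frozenset(state)     # under "fifth", '-' is a state of its own

def fitch(node, gap_mode="missing"):
    """Return (state_set, steps) for one site on a rooted binary tree.

    A node is either a tip (a one-character string) or a pair of nodes.
    """
    if isinstance(node, str):
        return tip_set(node, gap_mode), 0
    left, right = node
    lset, lsteps = fitch(left, gap_mode)
    rset, rsteps = fitch(right, gap_mode)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1

# Hypothetical site in which two non-sister taxa carry a gap.
tree = (("A", "-"), ("-", "A"))
steps_missing = fitch(tree, "missing")[1]   # gaps ignored
steps_fifth = fitch(tree, "fifth")[1]       # gaps counted as changes
```

Treated as missing data the site adds nothing to tree length; treated as a fifth state it adds two steps on this tree, so the choice can change which topologies are preferred.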

Outgroup Selection. — Outgroups may be used
to determine character polarity or to root unrooted 
trees following a parsimony analysis (Watrous and 
Wheeler 1981; Donoghue and Cantino 1984; 
Maddison, Donoghue, and Maddison 1984). For 
Hymenoptera, selection of an outgroup for taxa at 
the rank of subfamily or above is often problem- 
atical. For example, although the Symphyta are 
perhaps best thought of as a basal paraphyletic 
group within the Hymenoptera, there are several 
competing hypotheses of relationships among 
symphytan groups (Ross 1937; Konigsmann 1977; 
Rasnitsyn 1980, 1988; Gibson and Goulet 1988). 
These alternative hypotheses affect both the choice 
of an outgroup for the remaining Hymenoptera 
(the Apocrita) as well as hypotheses of character 
state evolution within various symphytan lineages. 
Within the Apocrita, relationships among the non- 
aculeates are particularly problematical. Recent 
suggestions (Rasnitsyn 1988; Mason, unpublished 
data) that the Aculeata are the sister group to the 
Ichneumonoidea would have a significant impact 
on character polarities within those groups, but
this hypothesis remains relatively untested. Simi- 
lar problems are apparent for larger groups 
throughout the order. Indeed, at the ordinal level, 
there is virtually no agreement on the appropriate 
sister group to the Hymenoptera as a whole. Until 
higher level relationships among the Insecta are 
better known, the choice of an appropriate outgroup 
will continue to be a problem for phylogenetic
analyses within Hymenoptera, regardless of the
type of evidence used.

Why might the choice of outgroup be critical 
when using molecular data for phylogeny? Wheeler 
(1990) has recently discussed some of the problems 
posed by distant or uncertain outgroups when 
using molecular data. If distantly related taxa are 
used as outgroups, the probability that sequence 
similarity is due to random identity increases, and 
the chance that any one character is phylogenetically 
informative consequently decreases. If outgroup 
taxa are sufficiently divergent, polarization of 
characters essentially becomes random. In essence,
results become phenetic, rather than phylogenetic. 
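
The level of similarity expected from chance alone can be estimated under a simple assumption: if two unrelated sequences share the same base frequencies and sites are independent, the expected proportion of identical sites is the sum of the squared frequencies. The sketch below uses hypothetical compositions, not measured values.

```python
def expected_random_identity(freqs):
    """Expected proportion of matching sites between two random sequences
    drawn independently from the same base frequencies."""
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    return sum(p * p for p in freqs.values())

equal = expected_random_identity({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
# Hypothetical AT-rich composition (80% A + T).
at_rich = expected_random_identity({"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40})
```

Under these assumptions, equal base frequencies give 25% identity by chance, whereas the AT-rich composition gives 34%, so random matches to a distant outgroup are more frequent in AT-rich sequences.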

How can this affect results of parsimony 
analyses? If sequences are too divergent, an ingroup 
may not be resolved as monophyletic relative to 
multiple outgroups. We encountered this problem 
when using two Diptera, Aedes and Drosophila, as 
outgroups to Hymenoptera (Derr et al., in press). 
The two dipterans were nearly as divergent from 
each other as they were from some of the Hym- 
enoptera, resulting in instability at the base of the 
tree. In fact, in this case Hymenoptera could not be 
resolved as monophyletic relative to Diptera, clearly
an unsatisfactory result. 

Assessing the Reliability of Results. — Bootstrap
methods in combination with parsimony proce- 
dures have become popular in recent years as a 
way to assess the degree of support for a particular 
phylogenetic clade (Felsenstein 1985, 1988). 
Bootstrapping involves random sampling with 
replacement from a set of characters until a new 
character set is formed, equal in number to the 
original set. From this new character set, another 
maximum parsimony tree is estimated. The proce- 
dure is repeated many times (e.g. 100-10,000) and 
a distribution of solutions is obtained. Several
assumptions underlie bootstrap analysis (Felsenstein
1985, 1988). One is that nucleotides evolve entirely
independently of one another, that DNA initially
consists of unlinked sequences of nucleotides that
change at random throughout the molecule. Another
is that nucleotides are identically distributed
for all taxa. The first assumption may be violated 
with hymenopteran mtDNA. Our investigations 
(discussed below) indicate that hymenopteran 
mtDNA is highly AT-rich and that A-T 
transversions are far more likely than other types 
of transversional substitutions. This is a clear vio- 
lation of the equal probability assumption, which 
predicts that only 1/4 of all transversions should 
be A-T transversions. Another violation of the 
independence assumption arises with sequence 
data if secondary structural constraints in the mol- 
ecule result in compensating substitutions (Wheeler 
and Honeycutt 1988; Simon et al. 1990), such as 
Vawter (1991) found for a relatively small number 
of bases in the stem region of insect rRNA. How- 
ever, with bootstrap one can take these biases into 
account (just as with parsimony analysis) by ap- 
plying, for example, less weight to A-T transversions 
or to sequences with known compensating sub- 
stitutions. The second assumption, that characters 
are identically distributed, poses a difficulty if 
sequences are selected from different regions of the 
genome with different distributions. Also, mixing 
morphological and molecular data in a single 
bootstrap analysis would violate this assumption 
if the two character sets reflect different distribu- 
tions (e.g., continuous and discrete; normal and 
Poisson). Bootstrap percentages are often inter- 
preted as confidence intervals associated with 
particular topologies. However, this is appropriate 
only for testing the validity of a single lineage that 
has been identified in advance of the analysis 
(Swofford and Olsen 1990) and if the above as- 
sumptions are met. Violation of these assumptions 
severely reduces the accuracy of the reported con- 
fidence intervals (Sanderson, 1989). Even though 
the assumptions of the bootstrap may be restrictive, 
it is nonetheless a valuable heuristic method for 
testing the robustness of results from parsimony 
analyses. For example, the appearance of a par- 
ticular group or clade in all or most (e.g. 85-95%) of 
the replicates may be used as an index of support 
for its monophyly (Swofford and Olsen 1990).
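
The resampling step itself is simple and can be sketched as follows. Columns of the alignment are drawn with replacement until a pseudoreplicate of the original length is obtained; the tree search is left as a caller-supplied function. The `clade_test` argument and the toy `xy_closest` criterion (a distance comparison standing in for a real parsimony search) are hypothetical, as is the three-taxon alignment.

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one pseudoreplicate)."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}

def clade_support(alignment, clade_test, n_reps=100, seed=0):
    """Proportion of pseudoreplicates in which clade_test returns True."""
    rng = random.Random(seed)
    hits = sum(bool(clade_test(bootstrap_columns(alignment, rng)))
               for _ in range(n_reps))
    return hits / n_reps

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def xy_closest(aln):
    """Crude stand-in for a tree search: are X and Y nearest to each other?"""
    return hamming(aln["X"], aln["Y"]) < min(hamming(aln["X"], aln["Z"]),
                                             hamming(aln["Y"], aln["Z"]))

alignment = {"X": "AACCGGTTAC", "Y": "AACCGGTTGC", "Z": "TACGGATTAA"}
support = clade_support(alignment, xy_closest, n_reps=200, seed=1)
```

In a real analysis `clade_test` would run a parsimony search on each pseudoreplicate and report whether the group of interest appears in the resulting tree; differential site weights could be carried along with the resampled columns.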

One may also wish to know whether a given set 
of characters supports one tree topology significantly
more strongly than another topology under the 
assumption of parsimony. Templeton's paired 
comparisons test compares two trees in this fash- 
ion. This test is an application of Wilcoxon's non- 
parametric signed-ranks test, using Templeton's 
criteria for nucleotide sequence data (Templeton 
1983). The scoring procedure involves counting
the number of substitutions at each informative 
site for two given trees and applying the Wilcoxon 
test to the hypothesis that the total number of 
substitutions is equal for the two trees. Also, ad- 
ditional information can be incorporated into the 
scoring procedure. If, for example, it is known that 
transversions are more common than transitions in 
a given region of DNA, one might choose to give 
more weight to transitions by assigning them a 
higher score than transversions (e.g., transitions = 
2, transversions = 1). Templeton's test is conservative;
thus it is difficult to reject the null hypothesis
without large differences in the number of substi- 
tutions between the two trees. This is one of the 
strengths of the procedure. 
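
The calculation behind the paired comparisons test can be sketched as a Wilcoxon signed-ranks test on the per-site difference in steps between two trees. The version below is a simplified illustration: it drops sites with no difference, averages tied ranks, uses a normal approximation without a tie correction (which is crude for small numbers of sites), and the per-site step counts are invented for the example.

```python
import math

def wilcoxon_signed_rank(differences):
    """Two-sided Wilcoxon signed-ranks test with a normal approximation.

    Zero differences are dropped and tied absolute values share their
    average rank. Returns (T, z, p), where T is the smaller rank sum.
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("no sites differ between the two trees")
    ordered = sorted(d, key=abs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for x, r in zip(ordered, ranks) if x > 0)
    w_minus = sum(r for x, r in zip(ordered, ranks) if x < 0)
    t = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return t, z, p

# Hypothetical steps required at eight informative sites on two trees.
steps_tree1 = [1, 2, 1, 1, 3, 1, 2, 1]
steps_tree2 = [2, 2, 2, 1, 3, 2, 3, 1]
diffs = [a - b for a, b in zip(steps_tree1, steps_tree2)]
t, z, p = wilcoxon_signed_rank(diffs)
```

In this invented example all four sites that differ favor the first tree, yet the approximate two-sided probability remains above 0.05, which illustrates the conservative behavior of the test. Weighted scores (e.g., 2 for one substitution class and 1 for another) would simply replace the raw step counts.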

Felsenstein has developed a test that uses similar 
data to investigate relationships among sets of 
three taxa, following the statistical approach of 
Cavender (1981). Cavender pioneered methods for
applying confidence intervals to phylogenies based
on parsimony, and Felsenstein (1985) later modified
Cavender's methods to include sequence data.
For a group of three taxa (rooted with an outgroup), 
there are three possible alternative tree topologies. 
Two statistics are used to evaluate whether the 
most parsimonious topology is significantly better 
than the other two at the 95% confidence level: S is 
the number of additional steps that a tree must 
have to be significantly worse than the most par- 
simonious tree; C is the number of phylogeneti- 
cally informative characters that must support a 
tree topology for it to be significantly better than 
the others (Felsenstein 1985). Like Templeton's 
test, this test is conservative; a tree topology that 
differs by only a few steps, or is supported by only 
a few more characters will not be significantly 
different. Felsenstein's test assumes a molecular 
clock (i.e., that the number of changes in a lineage 
is roughly proportional to the amount of time since 
its divergence), a controversial assumption which 
has only just begun to be examined in insects 
(Crozier et al. 1989). 

One final caveat: if data are homoplastic, mul- 
tiple models of character state change may be 
possible on a given minimum-length tree topology.
The simplest example is a case in which a parallelism 
or a reversal is equally parsimonious, and either 
may be used to explain the data. Parsimony pro- 
grams contain a number of tools, known as 'opti- 
mization' methods, to assist in modeling character 
state change on cladograms. We caution against
the use of any one criterion, for example,
minimizing parallelisms or reversals. In principle, the best
approach is to determine all of the possible alter- 
native models of character state change for each 
equally parsimonious tree topology. In practice, 
this is feasible only if relatively few characters are 
involved; with sequence data the alternatives are 
likely to be numerous and complex. A workable 
alternative is the use of tree diagnostics, which 
show the minimum and maximum number of steps 
possible in each interval on the tree under all 
possible models of character state change. 

EXAMPLES OF CURRENT RESEARCH 

Tribal Phylogeny of the Family Apidae. — The focus
of this investigation was to examine the usefulness
of DNA sequence data for resolving phylogenetic
relationships among tribes of the family
Apidae. Sequences from the mitochondrial 16S 
rRNA gene were compared in 15 exemplars representing
the four apid tribes (Cameron 1991). The
exemplar approach was justified on the basis that 
the tribes (considered as subfamilies by Michener 
1990) have been recognized as monophyletic 
groups. The use of several taxa (as many as is 
practicable for the study) to represent each clade is 
important for several reasons. First, the use of 
multiple taxa will help to resolve the degree of 
sequence variation exhibited within a given region 
(e.g., variation among species within a tribe or 
variation among tribes), hence assist in the selection 
of regions appropriate for a given level of inference.
Second, using multiple exemplars from each clade 
should help to eliminate random error or potential 
biases that could affect the evaluation of alternative 
phylogenies. At least two individuals of each species
were sequenced as a check against sequencing 
errors and potential intra-specific variation. Se- 
quences were obtained from fresh, frozen, and 
ethanol-preserved tissue. The outgroups for the 
analysis were selected from the subfamily 
Xylocopinae (family Anthophoridae), considered 
to be monophyletic and the closest relatives of 
Apidae (Sakagami and Michener 1987). Two 
outgroups were selected from two different 
xylocopine tribes (Xylocopini and Allodapini). 

Fig. 4. A representation of the mitochondrial 16S (large subunit)
rRNA gene, flanked by transfer RNAs (hatching). The two
outside arrows correspond to the position and direction of
extension of the oligonucleotide primers used in PCR and
sequencing reactions (Cameron, 1991, unpublished data).
Dotted lines circumscribe the approximately 600 bp region of
the gene that was amplified with PCR.

Between 500 and 600 bp were sequenced from
the 3' end of the 16S rRNA for all 17 taxa. Sequencing
was accomplished using two primers (Fig. 4; Appendix 1)
designed to optimize the match between
published sequences from the 16S mitochondrial
rRNA of the honey bee Apis mellifera L. (Vlasak et
al. 1987) and partial sequences obtained for other
apid taxa with the use of 'universal' primers (Kocher
et al. 1989; John Patton, unpublished data). From
the total number of nucleotides sequenced, 116 
were informative in the sense that at least two 
ingroup taxa shared substitutions at those sites. A 
gap was considered as a fifth character, which did 
not give undue weight to deletions as gaps were 
rare among the informative sites. Length polymor- 
phisms were evident in the 16S rRNA of every 
taxon, but these were not included as characters. 
Transition and transversion substitutions were
treated with equal weight in the analysis. The 
sequences were aligned by hand and checked by 
computer alignment using the Treealign Computer 
Program (Hein 1990a). The issues of length poly- 
morphisms, character weighting, and alignment 
are treated in detail above (see Phylogenetic 
Analysis of Sequence Data). 

The 116 informative sites from the 15 ingroup 
taxa and one of the outgroups, Xylocopa virginica 
(L.), were analyzed using maximum parsimony 
techniques implemented in PAUP (Version 3.0L, 
Swofford 1990). Maximum likelihood (Felsenstein 
1981) and bootstrap analyses (Felsenstein 1985) 
were implemented as heuristic methods to test for 
the reliability of the results based on maximum 
parsimony. Two equally parsimonious trees were 
produced (Figs 5A, 5B). In tree A, Apini + Euglossini
comprise one clade and Bombini + Meliponini 
comprise a second clade. In tree B, the Bombini + 
Meliponini clade is retained, with Euglossini as its 
sister group. The results are consistent with 
monophyly of each of the currently recognized 
tribes, except Bombini, which appears to be 
paraphyletic with respect to Meliponini (trees
indicating the monophyly of Bombini were only two
steps longer). Both bootstrap and maximum like- 
lihood analyses strongly supported the Bombini + 
Meliponini clade. To test for effects of the choice of 
outgroup, an additional outgroup (Allodapini: 
Exoneura) was included in a separate analysis. This 
resulted in two maximum parsimony trees, each 
with the same tribal topology as tree A (Fig. 5). 
Future work should include additional analyses of 
more distantly related outgroup taxa from the 
Anthophoridae.

The sequence information obtained from the 
16S region had some interesting characteristics, 
including a high proportion (> 80%) of A and T
bases and a correspondingly high number of 
transversion-substitutions. Length polymorphisms 
almost exclusively comprised strings of A's and 
T's. The occurrence of large insertions and dele- 
tions resulted in the exclusion of sections of the 
sequences from the analysis because of question- 
able alignment. Nonetheless, this region was suffi- 
ciently conserved overall to be useful for resolving 
relationships at the tribal level. The instability of 
the Apini branch can probably be resolved by including
sequence information from additional
representatives of the Euglossini. Because of space
considerations, the aligned sequences and information
regarding percent sequence divergence,
base composition, and base distribution have been
omitted and will be presented elsewhere (Cameron,
unpublished data).

Fig. 5. The two most parsimonious trees (A and B) for the
tribes of Apidae, inferred using the branch and bound method
implemented in PAUP (from comparisons of nucleotide
sequences of mtDNA [16S rRNA] from 16 taxa). The trees are
simplified to show only the tribal topology. The outgroup is
Xylocopa virginica (Anthophoridae). Tree length for analyses
of 116 informative sites in 16 taxa was 304 steps, resulting in
a consistency index of 0.533.

Species Phylogeny for the Genus Apis. — The same
sequences from the 16S mitochondrial rRNA sub- 
unit (500-600 bp) discussed above were used in a 
separate analysis of five species of the genus Apis. 
These included A. mellifera, A. cerana F., A. 
koschevnikovi Buttel-Reepen, A. dorsata F., and A. 
florea F. One or more exemplars were selected from 
each of the three remaining apid tribes (Meliponini, 
Bombini s.s., and Euglossini) and Xylocopinae (X. 
virginica) to serve as outgroups. This study repre- 
sents a case in which comparative sequences for a 
given region are useful for two different levels of 
analysis. From the original data set (above) there 
were 36 informative sites within Apis. Maximum 
parsimony trees, based only on the informative 
sites, were estimated in separate analyses using 
each of the outgroups. Two equally parsimonious 
ingroup trees were produced (Figs 6A, 6B). Tree A
is concordant with recent analyses based on morphology
(Alexander 1991) and comparative sequences
from the mitochondrial cytochrome oxidase
subunit II gene (COII) (Garnery et al.
1991). A well-corroborated pattern of this nature, 
utilizing three independent data sets, is highly 
desirable for two reasons: (1) it suggests a high 
level of reliability in the phylogenetic pattern, and 
(2) offers strong support for the acceptance of 
hypothesis A over hypothesis B (Fig. 6). A complete 
discussion of these results will appear elsewhere. 

Relationships Among the Higher Levels of Hym- 
enoptera: mtDNA. — The focus of this study was to 
examine the phylogenetic utility and the degree of 
resolution provided for various hierarchical levels 
within Hymenoptera by nucleotide sequence information
from the 16S rRNA region of the mitochondrial
genome (Derr et al., in press). Representative
DNA sequences from two members of
the suborder Symphyta (superfamilies Siricoidea
and Tenthredinoidea) and seven from the suborder
Apocrita (superfamilies Ichneumonoidea,
Chalcidoidea and Vespoidea) were examined and
compared. In addition, published 16S rRNA
sequences from Aedes (HsuChen et al. 1984) and Apis
(Vlasak et al. 1987) were included in the analysis. 
Multiple individuals and clones were sequenced 
from each taxon. We were able to obtain usable
sequences from specimens killed and preserved in
70% ethanol. Sequences from smaller species
(Aphytis, Aphelinidae) were obtained from progeny
of single females (isolines). Details regarding
DNA isolation, PCR, cloning and sequencing will
be provided elsewhere (Derr et al. in press).

Fig. 6. The two most parsimonious trees (A and B) for Apis,
inferred using the exhaustive search method implemented in
PAUP from comparisons of nucleotide sequences of mtDNA
(16S rRNA) from 6 taxa. The outgroup is Xylocopa virginica. Tree
length for analyses of 6 taxa was 118 steps for 36 informative
sites, resulting in a consistency index of 0.643.

Following computer-assisted alignment of the 
sequences, a total of 573 nucleotide positions was 
reported with 287 variable in two or more taxa. 
Each of these sequences was characterized by nu- 
merous insertion/deletion events and a bias for A 
and T bases (cf. Fig. 3). The proportion of A and T ranged
from 0.533 to 0.794, with sequences from members
of Ichneumonoidea and Chalcidoidea displaying 
significantly lower A and T averages. Moreover, 
sequences from both of these superfamilies also 
exhibited significantly more strand asymmetry, 
with an unequal number of purines (A and G) or 
pyrimidines (T and C) on each DNA strand. 
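
Summaries of this kind are straightforward to compute from a sequence. The sketch below reports the proportion of A + T and the purine and pyrimidine counts on the strand supplied, ignoring gaps and ambiguous bases; the function name is hypothetical, and the example input is simply the 875-16SmF primer from Appendix 1, used here only as a convenient short sequence.

```python
from collections import Counter

def base_summary(seq):
    """A+T proportion and purine/pyrimidine counts for one strand."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {
        "AT": (counts["A"] + counts["T"]) / total,
        "purines": counts["A"] + counts["G"],
        "pyrimidines": counts["C"] + counts["T"],
    }

summary = base_summary("TTATTCACCTGTTTATCAAAACAT")
# {'AT': 0.75, 'purines': 9, 'pyrimidines': 15}
```

A marked excess of purines or pyrimidines on one strand, as in this short example, is the kind of asymmetry referred to above.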

Pairwise comparison among all taxa revealed 
percent sequence differences ranging from a low of 
< 2.5% to a high of slightly over 50%. These sequences
were analyzed using maximum parsimony
and bootstrap procedures available in PAUP
(Swofford 1990). This analysis, which included 
rooting with 16S rRNA sequence from the dipteran
Aedes, resulted in a single most parsimonious tree 
(Derr et al. in press). A bootstrap consensus tree 
derived from 100 replications had an identical 
topology. Two major groups of hymenopteran taxa
emerged; the first included the symphytans and 
the aculeates, the second comprised the parasitic 
Hymenoptera. However, examination of less par- 
simonious trees revealed another solution only 
two steps longer (out of 703 total steps) in which the 
Symphyta form a basal grade to a monophyletic 
apocritan clade (aculeates plus parasitic Hym- 
enoptera). All internodes were well supported in 
the bootstrap analysis with the notable exception 
of those leading to the Symphyta and the aculeates. 
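
Pairwise percent sequence difference, as reported above, can be computed from aligned sequences in a few lines. This sketch assumes that positions with a gap or ambiguous base in either sequence are skipped, which is one common convention and not necessarily the one used in the study; the two short sequences are invented.

```python
def percent_difference(a, b):
    """Uncorrected percent difference between two aligned sequences.

    Columns with a gap or non-ACGT symbol in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Percent difference for every pair of taxa in a dict of sequences."""
    taxa = sorted(alignment)
    return {(s, t): percent_difference(alignment[s], alignment[t])
            for i, s in enumerate(taxa) for t in taxa[i + 1:]}

d = percent_difference("ATTA-GCAT", "ATAAAGCTT")   # 2 of 8 sites differ
```

Values approaching 50% in AT-rich sequences are close to what unrelated sequences would show, which is consistent with the instability seen at the base of the tree.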

An additional analysis, using one of the two 
symphytans as an outgroup to the apocritan taxa, 
also resulted in a sister group relationship between 
the aculeates and the parasitics. This confirmed the 
instability at the base of the tree and suggested that 
a high level of sequence divergence precludes us- 
ing this region for resolving relationships at the 
subordinal level. Nevertheless, these results sup- 
port both the aculeates and at least these parasitic 
Hymenoptera as distinct monophyletic groups, 
and provided baseline information regarding the 
amount and type of nucleotide sequence informa- 
tion available from this region. Interestingly, the 
sister group relationship between Ichneumonoidea 
and Chalcidoidea is probably the most strongly 
supported result to emerge from the analysis. 
Among the parasitics, the three representatives of 
the Ichneumonoidea form a monophyletic group. 
However, sequence divergence among the 
ichneumonoids was low, providing little resolu- 
tion among the terminal taxa. Conversely, the two 
chalcidoid sequences examined, both from the 
genus Aphytis, clearly represented a monophyletic 
group and they are very divergent from one an- 
other, suggesting that this region may have con- 
siderable utility at the species level. Baseline in- 
formation of this type allows subsequent investi- 
gations to focus on areas of the genome most likely 
to produce phylogenetically useful information. 

Relationships Among the Higher Levels of Hym- 
enoptera: Nuclear rRNA. — An exploratory study to 
examine the usefulness of partial sequences of
the small-subunit rRNA for higher-level phyloge- 
netic applications within the Hymenoptera has 
recently been completed (Austin et al. unpublished
data). Although the final results of our investigation
are not yet available (and will be published
elsewhere), some points can be made that should 
prove useful to workers who are interested in the 
molecular systematics of Hymenoptera. 

We wished to examine three hypotheses; the 
paraphyly of the Symphyta, the basal position of 
the Stephanidae to the rest of the Apocrita, and the 
sister-group relationship between the Ichneu- 
monoidea and Aculeata (see Whitfield this issue 
for more information and references to these hypotheses).
Initial trials were made with five divergent
taxa (A. mellifera, Perga dorsalis Leach, Sirex
noctilio F., Megarhyssa nortoni (Cresson) and Ibalia 
leucospoides Hochenwarth). Multiple species from 
some of these five lineages were examined to check 
the reliability of these data and to test the various 
methods of analysis against confirmed monophyl- 
etic groups. Overall for the ingroup, we collected 
sequence information from three ichneumonoids 
(two ichneumonids and a braconid), two pergid 
sawflies and two aculeates (A. mellifera and 
Myrmecia sp.). Because the sister group to the Hy- 
menoptera is unknown, we employed multiple 
outgroups: Drosophila, Artemia (published se- 
quences, Dams et al. 1988), and two species of 
water beetle (newly sequenced as part of another 
study). 

Results obtained using three commercially 
available universal primers for the 18S subunit 
rRNA (A, B and C; Field et al. 1988; Baverstock et al.
1991a) revealed a mean sequence divergence of 
about 5% among the taxa. Two other primers (D 
and E, Baverstock et al. 1991a, 1991b), reportedly 
specific to more variable regions of the 18S subunit 
(Baverstock et al. 1991b), yielded sequences with 
3.3% to 17% divergence. These regions proved too 
conservative to test the above hypotheses. Addi- 
tional sequence information collected from six other 
species basically confirms the high degree of conservation
within the small-subunit ribosomal RNA,
a result which is consistent with those of Sheppard 
and McPheron (1991) for the Apidae. 

It is our opinion that sequences from the large- 
subunit (28S) rRNA, which have been useful in a 
preliminary investigation of higher level apid re- 
lationships (Sheppard and McPheron 1991), 
combined with mtDNA and nuclear DNA se- 
quences obtained with PCR technology, will be 
most fruitful for examining hypotheses of rela- 
tionships among suborders and families of the 
Hymenoptera. 
APPENDIX 1. MITOCHONDRIAL DNA PRIMERS

The following primers are based on hymenopteran mtDNA sequences and have been 
successfully employed on a range of species (Cameron 1991; Cameron unpublished data). 
All primers are written in the 5' to 3' direction. Primers are named for the gene in which they 
are located (e.g. 16S), their relative position in the gene (m=mid, l=low), and whether they 
prime in the forward (F) or reverse (R) direction. Two nucleotides at a single position (one 
below the other) represent a degenerate site (a nucleotide site occupied by more than one 
nucleotide). Degeneracy in the primer allows for some degree of mismatch between the 
primer and its complementary target. 
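
The way a degenerate site tolerates mismatch can be sketched in code. The example below writes degenerate positions with the standard IUPAC ambiguity codes (e.g., R = A or G) rather than with the stacked-nucleotide notation used in this appendix; that substitution, the function names, and the short template are assumptions made for illustration.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def primer_matches(primer, target):
    """True if every primer position is compatible with the target base."""
    return len(primer) == len(target) and all(
        t in IUPAC[p] for p, t in zip(primer.upper(), target.upper())
    )

def find_sites(primer, template):
    """Start positions (0-based) where the primer matches the template."""
    n = len(primer)
    return [i for i in range(len(template) - n + 1)
            if primer_matches(primer, template[i:i + n])]

# Hypothetical degenerate primer and template strand.
sites = find_sites("ACRT", "GGACATTTACGTCC")   # matches ACAT and ACGT
rc = reverse_complement("ACRT")                # "AYGT"
```

A reverse primer would be searched for on the template after taking its reverse complement, since all primers here are written 5' to 3'.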



16S rRNA Primers

875-16SmF (24mer)
Apis    5'-TTATTCACCTGTTTATCAAAACAT-3'

874-16SlR (20mer)
Apis    5'-TATAGATAGAAACCAATCTG-3'
C

16SmR (20mer)
Cotesia (Braconidae)    5'-CAGGTGAATATAAATTTGCC-3'

12S rRNA Primers

12SmF (20mer)
Bombus    5'-CTTATTAGAGAAACTTGTAG-3'



